optimistic rate
- North America > United States > Maryland > Baltimore (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Asia > Middle East > Jordan (0.04)
- Asia > India > Tripura (0.04)
- North America > United States > Maryland > Baltimore (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Asia > Middle East > Jordan (0.04)
- Asia > India > Tripura (0.04)
Optimistic Rates for Multi-Task Representation Learning
We study the problem of transfer learning via Multi-Task Representation Learning (MTRL), wherein multiple source tasks are used to learn a good common representation, and a predictor is trained on top of it for the target task. Under standard regularity assumptions on the loss function and task diversity, we provide new statistical rates on the excess risk of the target task, which demonstrate the benefit of representation learning. Importantly, our rates are optimistic, i.e., they interpolate between the standard $O(m^{-1/2})$ rate and the fast $O(m^{-1})$ rate, depending on the difficulty of the learning task, where $m$ is the number of samples for the target task. Besides the main result, we make several new contributions, including giving optimistic rates for excess risk of source tasks (multi-task learning (MTL)), a local Rademacher complexity theorem for MTRL and MTL, as well as a chain rule for local Rademacher complexity for composite predictor classes.
- North America > United States > Iowa > Johnson County > Iowa City (0.14)
- North America > Canada > Quebec > Montreal (0.04)
- Asia > China > Jiangsu Province > Nanjing (0.04)
- North America > United States > Maryland > Baltimore (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Asia > Middle East > Jordan (0.04)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)
- Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory (0.46)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)
- North America > United States > Maryland > Baltimore (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Asia > Middle East > Jordan (0.04)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)
- Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory (0.46)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)
Optimistic Rates for Multi-Task Representation Learning
We study the problem of transfer learning via Multi-Task Representation Learning (MTRL), wherein multiple source tasks are used to learn a good common representation, and a predictor is trained on top of it for the target task. Under standard regularity assumptions on the loss function and task diversity, we provide new statistical rates on the excess risk of the target task, which demonstrate the benefit of representation learning. Importantly, our rates are optimistic, i.e., they interpolate between the standard O(m {-1/2}) rate and the fast O(m {-1}) rate, depending on the difficulty of the learning task, where m is the number of samples for the target task. Besides the main result, we make several new contributions, including giving optimistic rates for excess risk of source tasks (multi-task learning (MTL)), a local Rademacher complexity theorem for MTRL and MTL, as well as a chain rule for local Rademacher complexity for composite predictor classes.
Optimistic Rates: A Unifying Theory for Interpolation Learning and Regularization in Linear Regression
Zhou, Lijia, Koehler, Frederic, Sutherland, Danica J., Srebro, Nathan
We study a localized notion of uniform convergence known as an "optimistic rate" (Panchenko 2002; Srebro et al. 2010) for linear regression with Gaussian data. Our refined analysis avoids the hidden constant and logarithmic factor in existing results, which are known to be crucial in high-dimensional settings, especially for understanding interpolation learning. As a special case, our analysis recovers the guarantee from Koehler et al. (2021), which tightly characterizes the population risk of low-norm interpolators under the benign overfitting conditions. Our optimistic rate bound, though, also analyzes predictors with arbitrary training error. This allows us to recover some classical statistical guarantees for ridge and LASSO regression under random designs, and helps us obtain a precise understanding of the excess risk of near-interpolators in the over-parameterized regime.
- North America > United States > Illinois > Cook County > Chicago (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- North America > United States > California (0.04)
- (3 more...)
On aggregation for heavy-tailed classes
We introduce an alternative to the notion of `fast rate' in Learning Theory, which coincides with the optimal error rate when the given class happens to be convex and regular in some sense. While it is well known that such a rate cannot always be attained by a learning procedure (i.e., a procedure that selects a function in the given class), we introduce an aggregation procedure that attains that rate under rather minimal assumptions -- for example, that the $L_q$ and $L_2$ norms are equivalent on the linear span of the class for some $q>2$, and the target random variable is square-integrable.
- Oceania > Australia > Australian Capital Territory > Canberra (0.04)
- North America > United States > New York (0.04)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- (2 more...)
Online Nonparametric Regression
Rakhlin, Alexander, Sridharan, Karthik
We establish optimal rates for online regression for arbitrary classes of regression functions in terms of the sequential entropy introduced in (Rakhlin, Sridharan, Tewari, 2010). The optimal rates are shown to exhibit a phase transition analogous to the i.i.d./statistical learning case, studied in (Rakhlin, Sridharan, Tsybakov 2013). In the frequently encountered situation when sequential entropy and i.i.d. empirical entropy match, our results point to the interesting phenomenon that the rates for statistical learning with squared loss and online nonparametric regression are the same. In addition to a non-algorithmic study of minimax regret, we exhibit a generic forecaster that enjoys the established optimal rates. We also provide a recipe for designing online regression algorithms that can be computationally efficient. We illustrate the techniques by deriving existing and new forecasters for the case of finite experts and for online linear regression.